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Abstract Growth curve models are widely used in social 
and behavioral sciences. However, typical growth curve 
models often assume that the errors are normally distributed 
although non-normal data may be even more common than 
normal data. In order to avoid possible statistical inference 
problems in blindly assuming normality, a general Bayesian 
framework is proposed to flexibly model normal and non¬ 
normal data through the explicit specification of the error 
distributions. A simulation study shows when the distribu¬ 
tion of the error is correctly specified, one can avoid the 
loss in the efficiency of standard error estimates. A real 
example on the analysis of mathematical ability growth data 
from the Early Childhood Longitudinal Study, Kindergarten 
Class of 1998-99 is used to show the application of the pro¬ 
posed methods. Instructions and code on how to conduct 
growth curve analysis with both normal and non-normal 
error distributions using the the MCMC procedure of SAS 
are provided. 

Keywords Growth curve models ■ Bayesian estimation • 
Non-normal data ■ r-distribution • Exponential power 
distribution • Skew normal distribution ■ SAS PROC 
MCMC 

Introduction 

In the past half century, growth curve models (GCMs) have 
evolved from fitting a single curve of one individual to 
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fitting multilevel or mixed-effects models and from lin¬ 
ear to nonlinear models (e.g., McArdle, 2000; McArdle & 
Nesselroade, 2002; Preacher et al., 2008; Tucker, 1958). The 
application of GCMs in social and behavioral research has 
grown rapidly since Meredith and Tisak (1990) showed that 
such models can be fitted as a restricted common factor 
model in the structural equation modeling framework (see 
also, McArdle (1986) and McArdle and Epstein (1987)). 
Nowadays, GCMs are widely used in analyzing longitudinal 
data to study change (e.g., Laird & Ware, 1982; McArdle & 
Nesselroade, 2003; Meredith & Tisak, 1990; Zhang et al., 
2007). For a more comprehensive discussion of GCMs, see 
(McArdle & Nesselroade, 2002). 

To estimate GCMs, the maximum likelihood estima¬ 
tion (MLE) method is commonly used (e.g., Demidenko, 
2004; Laird & Ware, 1982). MLE is available in both 
commercial software, such as SAS PROC MIXED, and 
free software such as the R package lme4. MLE typically 
assumes that the errors of GCMs are normally distributed. 
However, in reality, data are more likely to be non-normally 
than normally distributed (Micceri, 1989). The violation of 
the normality assumption may result in unreliable param¬ 
eter estimates, incorrect standard errors and confidence 
intervals, and misleading statistical tests and inference 
(Maronna et ah, 2006; Yuan et ah, 2004; Zu & Yuan, 2010). 

Recently, Bayesian methods have become very popu¬ 
lar in the social and behavioral sciences largely advanced 
by the work of Lee and colleagues (e.g., Lee, 2007; Song 
& Lee, 2012). Nowadays, Bayesian methods have also 
received significant attention as useful tools for estimating 
a variety of models including GCMs, especially complex 
GCMs that can be difficult or impossible to estimate in 
the current MLE framework or in MLE based software 
(e.g., Chow et ah, 2011; Muthen & Asparouhov, 2012; 
Song et ah, 2009, 2011; Wang & McArdle, 2008; Zhang 
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et al., 2007). For example. Song et al. (2009) proposed a 
novel latent curve model for studying the dynamic change 
of multivariate manifest and latent variables and their 
linear and interaction relationships. Wang and McArdle 
(2008) applied the Bayesian method to estimate a GCM 
with unknown change points. Zhang et al. (2007) dis¬ 
cussed how to integrate information in growth curve mod¬ 
eling through informative priors. There are also Bayesian 
studies to analyze non-normal longitudinal data using 
GCMs. For example, Wang et al. (2008) developed Tobit 
GCMs to deal with limited response data. Zhang et al. 
(2013) extended GCMs to allow the errors to follow a 
f-distribution. 

The Bayesian estimation for GCMs has been gener¬ 
ally implemented in the specialized software WinBUGS. 
Although WinBUGS is flexible to estimate a wide range 
of models, it is unfamiliar to a lot of researchers and can 
be difficult to use. Since the version 9.2, the SAS institute 
released a Markov chain Monte Carlo (MCMC) proce¬ 
dure to implement a general Bayesian estimation procedure 
through the adaptive block random-walk Metropolis algo¬ 
rithm with normal proposal distributions (SAS, 2009). The 
MCMC procedure has the potential to become popular 
because of many important advantages, a few given here. 
First, SAS already has a large user base who can under¬ 
stand and learn the use of the MCMC procedure relatively 
easily. Second, the data manipulation procedure available in 
the SAS data step can be used with the MCMC procedure 
seamlessly. Third, specifically for the growth curve analysis, 
the use of the MCMC procedure is similar to the use of the 
existing NLMIXED procedure. Fourth, it is easy to define 
new prior and data distributions in the MCMC procedure. 
Currently, the use of the MCMC procedure is still rare, not 
to say its application in growth curve modeling. One excep¬ 
tion is Zhang (2013) that provides SAS code for the MCMC 
procedure for growth curve modeling but does not give any 
instruction on its use. 

Therefore, this study serves two main purposes. First, 
we will present a general framework to flexibly model the 
error distributions of GCMs. Second, we will demonstrate 
how to estimate the models using the MCMC procedure 
with instructions. In the following, we will first discuss 
how to model different types of error distributions. Specif¬ 
ically, we will focus on how to use Bayesian methods 
to estimate GCMs with the normal, t, exponential power, 
and skew-normal errors. Then, we will demonstrate how 
to practically apply the method through the analysis of 
data from the Early Childhood Longitudinal Study, Kinder¬ 
garten Class of 1998-99 (ECLS-K). Finally, we will evaluate 
the influence of distribution misspecification through a 
simulation study. 


Growth curve models with different types of error 
distributions 

A typical GCM can be written as a two-level model. The 
first level involves fitting a growth curve with random coef¬ 
ficients for each individual and the second level models the 
random coefficients. Specifically, the first level of a GCM 
can be written as, 

yn = fitimen , b,-) + e,- f , t = 1,..., T, i = 1,-N, 

where y, t is the observed response for individual i at trial or 
occasion t, time,, denotes the measurement time of the rth 
measurement occasion on the ith individual, f (timet, . b,) 
is a function of time representing the growth curve with 
individual coefficients b, = (bn, b, 2 , ■ ■. ,bi p )', a p x 
1 vector of random coefficients, and et, is the residual 
or error. 

The function f (timet,, b/) can take either a linear or a 
nonlinear form of the time variable. The function / deter¬ 
mines the growth trajectory representing how an individual 
changes over time. For example, if f (timet,, b,) = /;,o + 
bt\t with timett = f and b; = (bj o, bn)', it represents a lin¬ 
ear growth trajectory. If f (timet,, b/) = bto + but + bnt 2 
with timet, = t and b, = (bto, bn, bn)', it represents a 
quadratic growth trajectory. 

The second level of a GCM can be expressed as, 

b t = P + u, 

where /? is a vector of fixed-effects that can be seen as 
the average of the random effects b, without covariates in 
the second level, and u, represents, typically, the multi¬ 
variate normal errors with E(ut) = 0 /)X I and Cov(u,) = 
D pxp . The u, is also called the random effects from a 
mixed-effects model perspective. 

For growth curve analysis, it is typically assumed that et, 
is normally distributed with mean 0 and variance a 2 . How¬ 
ever, the normal distribution often lacks the flexibility to 
deal with complexity of real data. For example, the normal 
distribution is symmetric but data in practice can be skewed. 
Furthermore, it has been shown that the normal distribution 
is not robust to outliers or data with heavy tails (Yuan and 
Zhang, 2012b). If a normal distribution is mis-specified for 
a non-normal distribution, the parameter estimator will not 
be as efficient. 

In this study, we propose a general framework to model 
the distribution of et, explicitly and flexibly. The et, can be 
assumed to have the general form p(et,\t() where its exact 
form can be decided from data. The distribution parameter 
£ can typically be estimated within a GCM. To be consis¬ 
tent with the growth curve analysis literature, the mean of 
et, is assumed to be 0. To determine the distribution of et,. 
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individual growth curve analysis can be utilized; to estimate 
the model parameters including /?, D, a~, and £, Bayesian 
estimation methods can be applied. 

Individual growth curve analysis 

To explore the distribution of e, r , the method based on 
the individual growth curve analysis discussed by Tong 
and Zhang (2012) can be applied. Using the method, an 
individual growth curve 

yu = / {time it , b, ) + e it , t = 1, ..., T 

is first fitted to data from each individual through the least 
squares method. The individual coefficients b, and the error 
ett can be estimated and retained. Specifically, we have 

en = yu - f (time it , b;) • 

With the estimated errors, one can first conduct a 
D’Agostino skewness test (D’Agostino, 1970) on the errors. 
The D’Agostino skewness test evaluates whether a distri¬ 
bution is symmetric or not. If the test rejects the symmetry 
of the distribution, a skew distribution is desired. One then 
conducts an Anscombe test (Anscombe & Glynn, 1983) on 
the kurtosis. The Anscombe test evaluates whether the kur- 
tosis of the error is different from the normal kurtosis. If the 
test rejects the normal kurtosis, a distribution with excess 
kurtosis is then needed. 

In addition to the summary statistics on skewness and 
kurtosis, more detailed distributional information can be 
observed through the histogram or density of eit- Based 
on the information, one can decide on one or more can¬ 
didate error distributions. If more than one candidate dis¬ 
tribution can be used, the Kolmogorov-Smirnov test (e.g., 
Chakravarti et al., 1967) can also be used to select the 
best one. One can also try out all candidate distributions 
for comparison. To ease the procedure, an online pro¬ 
gram has been developed to assist the selection of error 
distribution. The URL to the program is http://psychstat.org/ 
gcmdiag. 

Two issues need to be emphasized here. First, the indi¬ 
vidual analysis method assumes that the growth function 
is already known a priori. The following strategy can be 
used to assist the choice of the growth function. If a growth 
function is supported theoretically or empirically, it should 
be used. Otherwise, exploratory techniques can be used to 
assist the selection of a growth function. For example, both 
the individual and mean growth trajectories from the data 
can be plotted. Based on the plot, one can decide whether 
a trajectory is linear or nonlinear. With the linear trajectory, 
a linear function can be used. For the nonlinear trajectory, 
it might also suggest what kind of nonlinear form fits best. 


There are situations where several nonlinear forms can be 
applied. In the situations, one might try all the nonlinear 
functions or select the one with the simplest form. Note that 
in studying the trajectory, both the mean and individual tra¬ 
jectories should be considered because the mean trajectory 
is often influenced by the non-normality of data. Second, 
even when the growth function is known, the population 
error distribution is often impossible to know. The individ¬ 
ual analysis method aims to best approach the population 
distribution through the empirical data. In addition, one 
would prefer a distribution that is less affected by potential 
misspecifications. 

Bayesian estimation 

General introduction on Bayesian analysis can be found in 
both introductory books (e.g., Kruschke, 201 1) and special¬ 
ized ones (e.g., Lee, 2007; Song & Lee, 2012. For a GCM 
with the error distribution p{eu | £), the joint posterior distri¬ 
bution of the random effects b, and model parameters based 
on the data augmentation algorithm (Tanner & Wong, 1987) 
is 

p(P, D, £,b;|y;r, time it , i = 1. N, t = 1. T) 

<x p()3, Daniil p(b,-|/J,D)n,=i [p(yn\f (timeit, bt), ?)]} 

= pGS.D.oniLi ex P [-j(bi -/»)]) 

x Flili n,=i [p(yu\f (timet,, bi), {)] 

( 1 ) 

where p{/}, D, £) is the joint prior distribution for model 
parameters. 

In this study, the uninformative independent priors are 
used for /?, D, and £ such that p(fi, D, £) = p(fi)p(D)p(0 
to minimize the influence of priors. For /L a bivariate nor¬ 
mal prior MN(/3q, So) is used with /? 0 = (0, 0/ and 
E() = IOOOI2. For D, an inverse Wishart prior distribu¬ 
tion IW (mo, Vo) is used with mo = 2 and Yq = I 2 . 
Here, I2 denotes a 2 by 2 identity matrix. These priors 
are chosen so that the density is relatively flat and are 
often viewed as uninformative in the Bayesian literature 
(e.g., Congdon, 2003; Gelman et al., 2003). The prior distri¬ 
bution for £ depends on the form of the error distribution. 

Incorporating the priors above to Eq. 1, the resulting 
posterior is readily available but usually in a very com¬ 
plex form. To evaluate the posterior, the Markov chain 
Monte Carlo (MCMC) methods can be used to generate a 
sequence of random numbers conditionally on consecutive 
draws, formally known as Markov chains, from the poste¬ 
rior distribution, and to construct the parameter estimates 
using the average and standard deviation of the generated 
numbers. In the SAS MCMC procedure, it uses the block 
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Metropolis method that divides the model parameters into 
different blocks and samples each block of parameter suc¬ 
cessively. Specifically, we divide the parameters and the 
random effects into the following blocks - /}, D, £, and b/, 
i = I..... /V. From the joint posterior distribution, we can 
obtain the conditional posterior distribution for each block 
given the rest of parameters. Within each block, the condi¬ 
tion posterior can be obtained and sampled. Details on such 
algorithms can be found in the literature (e.g., Robert & 
Casella, 2004). 

There are two practical issues in applying MCMC meth¬ 
ods. The first is to diagnose the convergence of Markov 
chains. The convergence ensures that the empirical average 
from a Markov chain converges to the mean of the pos¬ 
terior distribution. Brooks and Roberts (1998) and Cowles 
and Carlin (1996) discussed many different methods for 
testing convergence. For example, one can use the Geweke 
test (Geweke, 1992) and/or visually inspect the trace plot of 
Markov chains. The Geweke test calculates a statistic that 
is normally distributed. For a converged Markov chain, the 
absolute value of a Geweke statistic should be smaller than 
1.96 at the 0.05 alpha level. Visual inspection focuses on 
whether there is any unstable pattern in the plot of the gener¬ 
ated Markov chains. Often times, the initial part of a Markov 
chain does not converge to the posterior and is often dis¬ 
carded as the burn-in period. The second is to decide the 
length of Markov chains. By nature, Markov chains always 
have autocorrelation. For two Markov chains, the one with 
greater autocorrelation provides less information about the 
posterior distribution than the one with smaller autocorre¬ 
lation. In other words, a longer Markov chain is needed 
to accurately describe a posterior if its autocorrelation is 
greater. To characterize the information in a Markov chain, 
we use the statistic called effective sample size (ESS). The 
ESS is the equivalent sample size assuming no autocorrela¬ 
tion. For two Markov chains with the same length, the one 
with the greater ESS provides more information. A practi¬ 
cal rule of thumb is to get an ESS at least 400 (Lunn et ah, 
2012 ). 

For a converged Markov chain with sufficient effec¬ 
tive sample size, we can construct the posterior mean, 
posterior standard deviation, and highest posterior den¬ 
sity (HPD; Box & Tiao, 1973) credible interval for each 
model parameter for inference. Let 9 represent a single 
parameter in the vector 0 = (fi. D, £)'. Suppose the con¬ 
verged Markov chain for 0 is 0,-, i = 1 , ,n where n 
is the number of iterations. Then, a point estimate of 9 
can be constructed by the sample mean of the Markov 
chain 


- - 1 .A 

6 =9 = - yet. 

n A 

i=i 


The standard deviation, or the standard error in the frequen- 
tist view, of 9 is given by 

s.d.{9) = - Y{°i-8) 2 - 

n — 1 e —‘' 

i=t 

Credible intervals for 9 can also be constructed based on 
the Markov chain. The most widely used credible intervals 
include the equal-tail credible interval and the HPD credi¬ 
ble interval. A 100(1 — a) % equal-tail credible interval is 
0 1_ “/ 2 ], where the lower and upper bounds are the 
100a/2th and 100(1 — a/2)th percentiles of the Markov 
chain, respectively. The HPD credible interval is the inter¬ 
val that covers 100(1 — a) % region of the density formed 
by the Markov chain but at the same time has the small¬ 
est interval width. For symmetrical posteriors, the equal-tail 
credible interval is the same as the HPD credible interval. 
For non-symmetrical posteriors, the HPD credible inter¬ 
val has smaller width than the equal-tail credible interval. 
Therefore, HPD is often used for asymmetrical posterior 
distributions. 


Illustration of error distributions 


Potentially, the error e lt can follow any distribution. The 
normal distribution is certainly most widely used. In addi¬ 
tion to the normal distribution, we present three other distri¬ 
butions that can extend GCMs to non-normal data analysis. 
They include the /, exponential power, and skew normal dis¬ 
tributions. It can be seen that the normal distribution is a 
special case of the three distributions. 

Student’s /-distribution 


The /-distribution can be viewed as a generalization of 
the normal distribution. The /-distribution has a kurtosis 
greater than that of the normal distribution. If e follows a /- 
distribution t(e |£ = (/x, (p, k)'), its density function can be 
written as 


r[(* + i)/ 2 ] 

Pt r(k/2) 



(e ~ M ) 2 
k<p 


k +1 

~T~ 


where /x is the location parameter, <p is the scale parame¬ 
ter, and k is the degrees of freedom. The /-distribution has 
longer tails and is more robust to outliers than the normal 
distribution (e.g., Lange et al., 1989; Zhang et al., 2013). 
Therefore, the /-distribution is often used in robust data 
analysis. As the degrees of freedom k goes to infinity, the 
/-distribution approaches the normal distribution. The mean 
of the /-distribution is 


E(e) = /x, k > 1, 
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and the variance is 


Var(e ) = 


- d>, k > 2. 

k - 2 


The t -distribution is symmetric with a skewness value 0. Its 
kurtosis is 


Y2 = 


3k-6 
k-A 


,k> 4. 


Note that when k goes to infinity, yz = 3, the kurtosis of 
the normal distribution. If the error ep ~ t(ep |0, </>, k), then 
yn ~ t(yu - f(time it , b;)|0, 0, k). 

In Fig. 1, we plot the density of the r-distribution with 
the degrees of freedom k = 1,3,6, 30, oo and the same 
location parameter p = 0 and scale parameter 0 = 1. 
When k = oo, its density is the same as the normal distri¬ 
bution. Note that with the increase of degrees of freedom, 
the r-distribution moves closer to the normal distribution. 
Also, it can be observed, on the tails, the r-distribution has 
greater density than the normal distribution and therefore 
allows data to be far away from the center. For real data, k 
can be estimated as an unknown parameter to quantify the 
deviation from normality. 


power distribution comes with different forms. In this study, 
the form by Box and Tiao (1973) is used. If e follows an 
exponential power distribution ep(e |f = (p, cr, y)'), the 
density function can be expressed as 


Pep(e) = a>(y)a 'exp 


~c(y) 




2/d+y)' 


( 2 ) 


where 

{F[3(l + y)/2]}'/ 2 
Y (l + y){r[(l + y)/2]}3/2 


and 


F[3(l + y)/2] ) 1/(1+}/) 
r[d + y)/2] j 


(3) 


(4) 


Flere p is a location parameter, and a is a scale parame¬ 
ter that is also the standard deviation of the population, y 
is a shape parameter that is related to the kurtosis of the 
distribution and characterizes the “non-normality” of the 
population. The mean and variance of the exponential power 
distribution are 


Exponential power distribution 

The r-distribution can model the error ep with kurtosis 
larger, but not smaller, than 3. The exponential power dis¬ 
tribution, also called the generalized error distribution or 
generalized normal distribution, on the other hand, can 
model the error with both positive and negative kurtosis 
(Box & Tiao, 1973; Nadarajah, 2005). The exponential 



e 


Fig. 1 The f-distribution with the location parameter p = 0 and scale 
parameter 0 = 1. The degrees of freedom is 1,3,5,30, and oo, respec¬ 
tively. The kurtosis for k = 1 or k = 3 does not exist, and the kurtosis 
is 9,3.23, and 3 for k = 6, 30, and oo, respectively 


E(e) = p 
Var{e) = a 2 . 

The exponential power distribution is symmetric and there¬ 
fore its skewness is 0. Its kurtosis is 

= F[5(l + y)/2]F[(l + y)/2] 

K2 F[3(l + y)/2] 2 

From the exponential power distribution, we can derive 
many other distributions. For example, if y = 0, the expo¬ 
nential power distribution becomes a normal distribution. If 
y = 1, it becomes a double exponential distribution. Fur¬ 
thermore, when y approaches — 1, it approaches a uniform 
distribution. For growth curve analysis, the error at different 
times can take an exponential power distribution with differ¬ 
ent y values to account for both platykurtic and leptokurtic 
errors simultaneously. If the error ep ~ ep{ep |0, er, y), then 
yu ~ ep(yu - f(time it ,bi) |0, a, y). 

To better illustrate the influence of y, we plot the density 
function of the exponential power distribution with different 
y values in Fig. 2. The five distributions in the figure share 
the same location parameter p = 0 and scale parameter 
cr = l. y takes one of the five values —.75, —.25, 0, .5, 1. 
When y = 0, the distribution becomes a normal distribu¬ 
tion. When y > 0, the exponential power distribution has 
a fatter tail than the normal distribution; when y < 0, it 
has a thinner tail than the normal distribution. Therefore, the 
generalized error distribution can deal with data with both 
fat and thin tails. For real data, y can be estimated as an 
unknown parameter (Box & Tiao, 1973). 
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Fig. 2 The generalized error distribution with p = 0, <p = 1 and 
different values of y. Each value of y is placed on the top of the den¬ 
sity plot. The corresponding kurtosis for y = —.75, —.25, 0, .5, 1 is 
1.92,2.55,3,4.22,6, respectively 


Skew normal distribution 


The kurtosis of the distribution is 


Y2 = 3 + 2(;r - 3) 


(3V27F) 4 

(1 - 2S 2 /jt ) 2 


and is greater than that of the normal distribution. 

To show the influence of a on skewness, we plot the den¬ 
sity function of the skew normal distribution with different 
a values in Fig. 3. For comparison, we maintained the same 
location parameter p = 0 and scale parameter co = I. The 
shape parameter a takes value —4, —2, 0, 2, 4. From the 
figure, when a = 0, the skew normal distribution becomes 
a normal distribution and its density is symmetric. If a > 0, 
the distribution is right skewed and if a <0, the distribu¬ 
tion is left skewed. For real data, a can be estimated as an 
unknown parameter. If the error e, t ~ sn(e, t \p, co, cp), then 
yit ~ epiyt, — f(timeu, bi)\p, co, (p). Because we often 
assume the mean of error is zero, we can reparameterize the 
model parameter p so that p — —coS^/2/jt. 

The error distributions discussed here can be plugged into 
the posterior in Eq. 1. The Bayesian method can then be 
used to obtain model parameter estimates. We now explain 
how to estimate GCMs with different error distributions 
using the MCMC procedure of SAS. 


Both the t -distribution and the exponential power distribu¬ 
tion are symmetric and, therefore, may not be applied if 
the error distribution is asymmetric. A skew normal dis¬ 
tribution captures potential departure from symmetry by 
allowing for skewness. A skew normal distribution can also 
be viewed as a generalization of the normal distribution 
(Azzalini, 1985). If e follows a skew normal distribution 
sn(e |£ = (/x, co, a)'), its density function can be written as 

Psn(e) = -Ip fa-—-) 

where p is a location parameter, o> is a scale parameter, and 
a is a shape parameter that determines the skewness of the 
distribution, cp and <t> are the standard normal density func¬ 
tion and cumulative standard normal distribution function, 
respectively. The mean and variance of the skew normal 
distribution are 


E{e) = p + coS 


and 


Var(e ) = co z I 1- 

jr 


where 8 = 


Va+“ 2 ) 


. Furthermore, its skewness is 


Model estimation using the MCMC procedure 

GCMs with different error distributions can be estimated 
using the MCMC procedure of SAS. We illustrate the use of 
the MCMC procedure in estimating GCMs with the normal 
distribution, /-distribution, exponential power distribution, 



e 


4 — 7T (S^W) 3 

2 (1 — 2<5 2 /tt) 3/2 ' 


Fig. 3 The skew normal distribution with p = 0 and co = 1 and differ¬ 
ent values of a. The corresponding skewness for a = —4. —2, 0, 2, 4 
is —.78, —45, 0, .45, .78, respectively 
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and skew normal distribution. The code can be found from 
our website at http://psychstat.org/brm2015. 

Growth curve model with the normal error 

The SAS code using the procedure MCMC for estimat¬ 
ing the normal error model is given in Code Block 1. The 
code on the first line controls the overall data analysis. 
DATA=ndata specifies the data set to use. NBI = 5000 
tells SAS to throw away the first 5,000 iterations as burn- 
in. NMC=4 0000 tells SAS to generate Markov chains with 
another 40,000 iterations for analysis. THIN=2 means that 
every other iteration will be saved for inference. There¬ 
fore, combining with NMC, a total of 20,000 iterations 
are saved. SEED=17 specifies the random number gen¬ 
erator seed. INIT=RANDOM is used to generate the ran¬ 
dom initial values for parameters that are not provided 
with initial values. DIAG= (ESS GEWEKE) requests the 
calculation of the effective sample size and the Geweke 
test statistic. STATISTICS (ALPHA=0.05) = (SUMMARY 
INTERVAL) requests the calculation of the summary and 
interval statistics for each parameter at the alpha level 0.05. 
OUTPOST=histnorm will save the generated Markov 
chains into a SAS data set named histnorm, which can be 
processed further if needed. 

Line 2 uses the SAS output delivery system (ODS) to 
output nicely arranged tables and figures. PARAMETERS 
requests the output of the sampling method, prior dis¬ 
tribution, and initial value information for each parame¬ 
ter. POSTSUMMARIES requests the output of the mean, 
standard deviation, and 25 %, 50 %, and 75 % per¬ 
centiles, and POSTINTERVALS requests the output of 
the 95 % percentile and HPD credible intervals of 
model parameters. GEWEKE and ESS request the out¬ 
put of the Geweke test and the effective sample size. 
TAD PANEL will generate the trace, autocorrelation and den¬ 
sity plots in one figure for visually inspecting Markov chain 
convergence. 

The keyword ARRAY specifies an array. For example, 
b [ 2 ] on Line 3 is an array with two elements and the first 
element is L and the second element is S. Sigma Jo [2,2] 
on Line 5 specifies a 2 by 2 array. If the values of an array 
are known, they can be provided as fixed values. For exam¬ 
ple, betaO [2] on Line 6 has two elements with values 0 
and V [2,2 ] on Line 8 is an identity matrix. 

The keyword PARMS specifies each block of parame¬ 
ters and their initial values. For example, on Line 9, beta 
is treated as one block of parameters with the initial val¬ 
ues {5,2}. Note that the initial values are enclosed by 
{} for vectors and arrays. The keyword PRIOR specifies 
the prior distribution for the model parameters. For exam¬ 
ple, a multivariate normal prior (MVN) is used for beta 
on Line 12. Note that betaO and sigmaO are given in 


the ARRAY statement. Furthermore, Sigma Jd is given an 
inverse Wishart prior on Line 13 and var_e is given an 
inverse Gamma prior on Line 14. 

The random coefficients b are augmented with model 
parameters in the estimation. The distribution of b is a 
multivariate (bivariate specifically) normal distribution and 
is specified as RANDOM b ~ MVN (beta, SigmaJo) 
SUBJECT=id on Line 15 where id is a variable that 
distinguishes each individual. 

Since the error e,- ? follows a normal distribution, y,g 
also has a normal distribution with the mean b t q + but 
and the same variance. Therefore, the data are mod¬ 
eled using a normal distribution as shown in MODEL 
y ~ NORMAL (mu, var=var_e) on Line 17 with 
the mean is calculated by mu = L + S * time on 
Line 16. 

Growth curve model with the t error 

The SAS code for the linear GCM with the t error is 
given in Code Block 2. The code is very similar to that 
for the normal error except that the data y are modeled 
by the r-distribution y ~ t (mu, var=var_e, df) 
with the unknown degrees of freedom df as shown on 
Line 19. Furthermore, the df is given a uniform prior 
UNIFORM (1,500) on Line 16. 

Growth curve model with the exponential power error 

The SAS code for the linear GCM with the exponential 
power error is given in Code Block 3. The code is similar 
to that for the normal error and the t error. In addition to 
the extra parameters on y (gl, g2, g3, g4 on Lines 12-15), 
a notable difference is that the data are given a general 
distribution with argument 11 on Line 26. This is because 
there is no built-in exponential power distribution in SAS. 
However, a new distribution can be constructed flexibly. 
Comparing 11 on Line 24 and the exponential power den¬ 
sity function in Eq. 2, the use of a new distribution is clear 
and the new distribution is defined as the logarithm density 
function. Another difference is that MONITOR= (_PARMS_ 
var_e) is used on Line 1. MONITOR lists model parame¬ 
ters or newly created parameters with Markov chains to be 
saved. _PARMS_ represents all parameters directly used in a 
model as specified by the keyword PARMS. New parameter 
can be calculated. For example, var_e is the variance cal¬ 
culated based on the standard deviation sigma e on Line 
25. 

Growth curve model with the skew normal error 

The SAS code for the linear GCM with the skew normal 
error is given in Code Block 4. The difference between 
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1 


2 

3 

4 

5 

6 

7 

8 
9 

10 

11 

12 

13 

14 

15 

16 

17 

18 


Code Bock 1 SAS code for the GCM with the normal errort _ 

PROC MCMC DATA=ndata NBI=5000 NMC=40000 THIN=2 SEED=17 INIT= 
RANDOM DIAG=(ESS GEWEKE) STATISTICS(ALPHA=0.05)=(SUMMARY 
INTERVAL) OUTPOST=histnorm; 

ODS SELECT PARAMETERS POSTSUMMARIES POSTINTERVALS GEWEKE ESS 
TADPANEL; 

ARRAY b[2] L S; 

ARRAY beta[2]; 

ARRAY Sigma_b[2,2]; 

ARRAY beta0[2] (0 0); 

ARRAY sigma0[2,2] (1000 0 0 1000); 

ARRAY V[2,2] (10 0 1); 

PARMS beta {5 2}; 

PARMS sigma_b {1 0 0 1}; 

PARMS var_e 1; 

PRIOR beta ~ MVN(beta0, sigmaO); 

PRIOR Sigma_b ~ IWISH(2, V); 

PRIOR var_e ~ IGAMMA(0.001, SCALE=0.001); 

RANDOM b ~ MVN(beta, Sigma_b) SUBJECT=id; 
mu = L + S * time; 

MODEL y ~ NORMAL(mu, var=var_e); 

RUN; 


the current code and previous one is that the error e, 
instead of the data y, is modeled directly as indicated 
by MODEL e ~ general (11) on Line 23. The rea¬ 
son to model e is because the mean of the skew normal 
distribution is not “centered” around /i in the sense of 
the normal distribution. Directly modeling the data y 
will make the estimates of /? not comparable with those 
from using the other distributions. Therefore, we “trick” 
the MCMC procedure by specifying a skew normal dis¬ 
tribution with a mean —a>8^/2fjt as indicted by xi = 
-sigma_e*alpha/sqrt (l+alpha*alpha) *sqrt 
(2/3.1415926) on Line 21. 

An example 

The Early Childhood Longitudinal Study, Kindergarten 
Class of 1998-99 (ECLS-K) focuses on children’s early 
school experiences from kindergarten to middle school. The 
ECLS-K began in the fall of 1998 with a nationally repre¬ 
sentative sample of 21,260 kindergartners. The ECLS-K is a 
longitudinal study with data collected in the fall and spring 
of kindergarten (1998-99), the fall and spring of 1st grade 
(1999-2000), the spring of 3rd grade (2002), the spring of 
5th grade (2004), and the spring of 8th grade (2007). Data on 
children’s home environment, home educational activities, 


school achievement, school environment, classroom envi¬ 
ronment, classroom curriculum, and teacher qualifications 
are available. 

Typical longitudinal studies do not have such a large sam¬ 
ple size. Therefore, we randomly selected a sample of 563 
(about 2.5 %) children with complete data on mathemati¬ 
cal ability in the fall and spring of kindergarten and the 1 st 
grade. Summary statistics for this data set are provided in 
Table 1 and the longitudinal plot for all data is presented 
in Fig. 4. Both the summary statistics and the longitudinal 
plot suggested a linear growth pattern. 1 Based on the skew¬ 
ness, the data appear to be symmetric. However, based on 
the kurtosis, only the data at time 1 seem to be normally dis¬ 
tributed. The data distribution is platykurtic at time 2 and 3 
and leptokurtic at time 4. 

In order to diagnose the error distribution, we fitted a 
linear regression to the data from each individual. The resid¬ 
ual errors from the regression analysis were pooled together 
for normality test. Specifically, the D’Agostino skewness 
test and Anscombe-Glynn kurtosis test were conducted (See 

1 A more rigorous method is to fit different models to the data to select 
the best fit one. However, it is not clear how to compare different 
models in the current framework. Instead, the no growth model, lin¬ 
ear growth model, and the quadratic growth model were fitted to the 
data through MLE based on the normal error and it was found that the 
linear model fitted best overall. 
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Code Bock 2 SAS code for the growth curve model with the t error 


1 PROC MCMC DATA=ndata NMC=40000 NBI=5000 THIN=2 DIC SEED=17 INIT= 

RANDOM DIAG=(ESS GEWEKE) STATISTICS(ALPHA=0.05)=(SUMMARY 
INTERVAL) OUTPOST=histt; 

2 ODS SELECT PARAMETERS POSTSUMMARIES POSTINTERVALS GEWEKE ESS 

TADPANEL; 

3 ARRAY b[2] L S; 

4 ARRAY beta[2]; 

5 ARRAY Sigma_b[2,2]; 

6 ARRAY beta0[2] (0 0); 

7 ARRAY sigma0[2,2] (1000 0 0 1000); 

8 ARRAY V[2,2] (10 0 1); 

9 PARMS beta {5 2}; 

10 PARMS sigma_b {1 0 0 1}; 

11 PARMS var_e 1; 

12 PARMS df 5; 

13 PRIOR beta ~ MVN(beta0, sigmaO); 

14 PRIOR Sigma_b ~ IWISH(2, V); 

15 PRIOR var_e ~ IGAMMA(0.001, SCALE=0.001); 

16 PRIOR df-UNIFORM(1,500); 

17 RANDOM b ~ MVN(beta, Sigma_b) SUBJECT=id; 

18 mu = L + S * time; 

19 MODEL y ~ t(mu, var=var_e, df); 

20 RUN; 


Table 1 Summary statistics of the example data 


Time 

Mean 

Variance 

Skewness 

Kurtosis 

1 (Fall, kindergarten) 

4.62 

8.58 

0.69 

3.22 

2 (Spring, kindergarten) 

7.45 

11.11 

0.04 

2.58 

3 (Fall, 1st year) 

9.00 

10.96 

-0.22 

2.74 

4 (Spring, 1st year) 

11.95 

8.59 

-1.00 

4.27 


Table 2; Anscombe & Glynn, 1983; D’Agostino, 1970). 
First, the skewness was -0.06 and the kurtosis was 3.77 for 
the overall error distribution. The D’Agostino test indicates 
that the error distribution was symmetric but the Anscombe- 
Glynn test showed that the kurtosis of the error distribution 
was greater than that of the normal distribution. Then, the 
error distribution was checked at each measurement time. 
The error distribution at each time seemed to be symmet¬ 
ric, too. However, the distribution at time 2 was platykurtic 
and at time 4 was leptokurtic. Therefore, the error distribu¬ 
tion seems to be best modeled with the exponential power 
distribution at each time. 

The linear GCM with the exponential power error distri¬ 
bution was fitted to the data. The trace plots for the model 
parameters indicate that the Markov chains for all param¬ 
eters appeared to converge well after a burn-in period of 


5,000 iterations. Furthermore, all Markov chains passed the 
Geweke test as shown in Table 3. The effective sample sizes 
ranged from 448 to 9791 for model parameters. Parameter 



Fig. 4 Longitudinal plot of the mathematical data of the ECLS-K 
sample 
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Code Bock 3 SAS code for the growth curve model with the exponential power error 


1 PROC MCMC DATA=ndata NMC=100000 NBI=5000 THIN=2 MONITOR=(_PARMS_ 

var_e) DIC SEED=13 INIT=RANDOM DIAG=(ESS GEWEKE(F1=.2 F2=5)) 

S TATIS TICS (ALPH A=0. 05 )=(S UMM AR Y INTERVAL) OUTPOST=histged; 

2 ODS SELECT PARAMETERS POSTSUMMARIES POSTINTERVALS GEWEKE ESS 

TADPANEL; 

3 ARRAY b[2] L S; 

4 ARRAY beta[2]; 

5 ARRAY Sigma_b[2,2]; 

6 ARRAY beta0[2] (0 0); 

7 ARRAY sigma0[2,2] (1000 0 0 1000); 

8 ARRAY V[2,2] (10 0 1); 

9 PARMS beta {52}; 


10 PARMS sigma_b {1 0 0 1}; 

11 PARMS sigma_e 1; 

12 PARMS gl 0; 

13 PARMS g2 0; 

14 PARMS g3 0; 

15 PARMS g4 0; 

16 PRIOR beta ~ MVN(beta0, sigmaO); 

17 PRIOR Sigma_b ~ IWISH(2, V); 

18 PRIOR sigma_e ~ IGAMMA(0.01, SCALE=0.01); 

19 PRIOR g:~UNIFORM(-.6,5); 

20 RANDOM b ~ MVN(beta, Sigma_b) SUBJECT=id; 

21 mu = L + S * time; 

22 g = gl*(time=l)+ g2*(time=2)+g3*(time=3)+g4*(time=4); 

23 p = 1+g; 

24 11 = -LOG(sigma_e) + .5*LGAMMA(1.5*p)-1.5*LGAMMA(p/2)-log(p) - ( 

abs(y-mu)/sigma_e) ** (2/p) * (GAMMA(1.5*p)/GAMMA(.5*p)) ** (1/p); 

25 var_e = sigma_e*sigma_e; 

26 MODEL y ~ general(ll); 

27 RUN; 


Table 2 Tests of skewness and kurtosis for errors from the individual growth curve analysis 


Time 

D'Agostino skewness test 


Anscombe-Glynn kurtosis test 


Skewness 

z 

p-value 

Kurtosis 

z 

p-value 

1 

-0.08 

-0.49 

0.625 

3.19 

1.02 

0.309 

2 

0.18 

1.18 

0.237 

2.65 

-1.97 

0.048 

3 

-0.18 

-1.15 

0.249 

3.34 

1.60 

0.112 

4 

0.18 

1.18 

0.237 

3.79 

2.99 

0.003 

Overall 

-0.06 

-0.78 

0.435 

3.77 

5.41 

0.000 


For the D’Agostino skewness test, the null hypothesis is skewness = 0 and for the Anscombe-Glynn test, the null hypothesis is kurtosis = 3 
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Code Bock 4 SAS code for the growth curve model with the skew normal error 


1 PROC MCMC DATA=ndata NMC=40000 NBI=5000 THIN=2 DIC SEED=17 INIT= 

RANDOM DIAG=(ESS GEWEKE) STATISTICS(ALPHA=0.05)=(SUMMARY 
INTERVAL) MONITOR=(_PARMS_ var_e) OUTPOST=histsn; 

2 ODS SELECT PARAMETERS POSTSUMMARIES POSTINTERVALS GEWEKE ESS 

TADPANEL; 

3 ARRAY b[2] L S; 

4 ARRAY beta[2]; 

5 ARRAY Sigma_b[2,2]; 

6 ARRAY beta0[2] (0 0); 

7 ARRAY sigma0[2,2] (1000 0 0 1000); 

8 ARRAY V[2,2] (10 0 1); 

9 PARMS beta {5 2}; 

10 PARMS sigma_b {1 0 0 1}; 

11 PARMS sigma_e 1; 

12 PARMS alpha 0; 

13 PRIOR beta ~ MVN(beta0, sigmaO); 

14 PRIOR Sigma_b ~ IWISH(2, V); 

15 PRIOR sigma_e ~ IGAMMA(0.01, SCALE=0.01); 

16 PRIOR alpha~UNIFORM(-.5,.5); 

17 RANDOM b ~ MVN(beta, Sigma_b) SUBJECT=id; 

18 mu = L + S * time; 

19 var_e = sigma_e*sigma_e; 

20 e = y - mu; 

21 xi = -sigma_e*alpha/sqrt(l+alpha*alpha)*sqrt(2/3.1415926); 

22 11 = log(2)-log(sigma_e)+logpdf(’normal’,(e-xi)/sigma_e,0,1)+ 

logcdf( ’ normal’ ,alpha*(e-xi)/sigma_e,0,1); 

23 MODEL e ~ general(ll); 

24 RUN; 


Table 3 Results from the linear growth curve model with the exponential power error distribution 



Estimates 

SD 

Est/SD 

Equal-tail Cl 

HPDCI 


Geweke 


ESS 

2.5 % 

97. % 

2.5 % 

97.% 

statistic 

p-value 

00 

2.354 

0.148 

15.934 

2.063 

2.645 

2.064 

2.646 

-0.291 

0.771 

9791 

01 

2.364 

0.035 

66.966 

2.295 

2.434 

2.295 

2.433 

0.011 

0.992 

3964 

°o 

8.911 

0.762 

11.702 

7.494 

10.479 

7.434 

10.412 

0.715 

0.475 

3164 

C01 

-0.654 

0.156 

-4.198 

-0.972 

-0.363 

-0.962 

-0.354 

-0.770 

0.441 

1160 

a l 

0.242 

0.047 

5.106 

0.154 

0.339 

0.150 

0.335 

0.950 

0.342 

672 

a e 

2.444 

0.110 

22.237 

2.236 

2.666 

2.232 

2.660 

-1.011 

0.312 

2394 

Y l 

0.599 

0.490 

1.224 

-0.238 

1.712 

-0.289 

1.658 

-0.264 

0.792 

448 

V2 

-0.196 

0.161 

-1.220 

-0.491 

0.137 

-0.500 

0.126 

1.802 

0.072 

1831 

Y3 

0.091 

0.164 

0.555 

-0.216 

0.435 

-0.238 

0.406 

0.937 

0.349 

2313 

Y4 

0.649 

0.317 

2.048 

0.104 

1.360 

0.078 

1.321 

-1.384 

0.167 

861 
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estimates for the model are given in Table 3. Based on the 
estimates of y t , the errors at time 1, 3, and 4 had kurtoses 
greater than 3 (positive y) while the errors at time 2 had 
kurtosis smaller than 3 (negative /). Furthermore, based on 
the credible intervals, yn, is statistically significant indicat¬ 
ing that the error distribution at time 4 cannot be treated as 
normal. Based on the parameter estimates, the average inter¬ 
cept was 2.354 with an average rate of growth 2.364 in the 
mathematical ability. There were also significant individual 
differences in both the intercept and rate of growth. Further¬ 
more, children with a greater intercept would show a lower 
rate of growth based on the covariance estimate. 

Influence of error distribution misspecification 


is 0. For the exponential power distribution, the parameter 
y is set at different values at different measurement times. 
Three different sample sizes with N = 100, 300, 500 and 
two different measurement occasions with T = 4, 5 are 
investigated. The parameters y = (—0.8, —0.8, 0, 0.8, 0.8) 
for T = 5 and y = (—0.8, —0.8, 0.8, 0.8) for T = 4. Under 
each condition, 100 sets of data are generated and analyzed. 

To estimate the model with the normal, t, exponen¬ 
tial power, and skew normal distributions, the Bayesian 
method is used and implemented in the MCMC procedure 
of SAS as discussed earlier. For the bootstrap method, the 
same data are analyzed using Mplus and 1000 bootstrap 
samples are used. R code for data generation, running sim¬ 
ulation, as well as Mplus script can be found online at 
http://psychstat.org/brm2015. 


We have demonstrated that we can model error distribu¬ 
tions explicitly through Bayesian methods. If the errors are 
not normally distributed but the normal based method is 
used, the error distribution is misspecified. On the other 
hand, if the errors are normally distribution while a non¬ 
normal distribution is used, the error distribution is equally 
misspecified. In this section, we evaluate the influence of 
error distribution misspecification under the two situations 
through a simulation study. In the literature, the boot¬ 
strap method is often used to obtain the standard errors 
for model parameters when data are suspected of non¬ 
normality. Therefore, we also compare our methods using 
the non-normal distributions with the bootstrap method. 

Data generation 

Date are generated from the following linear GCM, 

yit — + but + e,t 

bio = Po + vio 

bi i = Pi + vi i 

where Pq = 5 and p\ =2. The random effects fol¬ 
low a bivariate normal distribution with mean zeros and 
Var(Vio ) = CTq = 2, Var(yn) = oq 2 = 1, and 

Var(vio, iHi) = eroi = 0. 2 To evaluate the influence of 
misspecifying normal error to non-normal error, e, t is set to 
follow a normal distribution with mean 0 and variance a 2 = 
1. To evaluate the influence of misspecifying non-normal 
error to normal error, ei t is set to follow a f-distribution 
with £ = (/i, <f>, k) = (0, 1/3,3), an exponential power 
distribution with £ = (ji, er, y) = (0, 1, y), and a skew nor¬ 
mal distribution with £ = (/x, co, a) = c(— 1.31, 1.64, 10), 
respectively. Note that the parameters of the skew normal 
distribution are chosen so that the mean of the distribution 

2 The population parameter values were chosen rather arbitrarily. But 
we do not expect that the choice of the values affects the simulation 
conclusions. 


Misspecifying non-normal error as normal error 


For the linear GCM with the normal error, t error, expo¬ 
nential power error, and skew normal error, the parameters 
/So, Pi, <Jq , eroi, and <r 2 are comparable and of the greatest 
interest to users. Figure 5 displays the scatter plots of param¬ 
eter estimates and their standard errors for the 5 parameters 
when modeling the error distribution as the normal and as 
non-normal distributions with N = 500 and T = 5 across 
all 100 replications of data analysis. The scatterplots of the 
parameter estimates on the left panel of Fig. 5 clearly show 
that when the non-normal error distributions were misspeci¬ 
fied as a normal distribution, there was not much difference 
in terms of parameter estimates. This is evidenced by the 
fact that the scatterplots nicely fall on the straight diagonal 
line. 

Flowever, the right panel of Fig. 5 shows that the standard 
errors became greater if the non-normal error distributions 
were misspecified as normal distribution because in the 
scatterplots, the majority individual points fall under the 
straight line. This means that the standard errors were over¬ 
estimated with the incorrect error distributions. To further 
quantify the overestimation, we calculate a statistic called 
reduced efficiency in standard error ( RESE ) 3 defined as 


RESE = 100 x 


, too 

— y 
100 ^ 


SVr (bw) 


- 1 


r =1 se r (0c) 


where se, (0 U) ) is the standard error estimates of the parame¬ 
ter estimate 6 W from the wrongly specified error distribution 

and se r (0 C ) is the corresponding one with the correctly spec¬ 
ified error distribution for the rth replication of data. Note 


3 One can also define similar statistic based on mean squared error. The 
pattern of the results is similar and can be obtain from the author along 
with the parameter estimates, standard error estimates, HPD interval, 
Geweke statistic, and effective sample size for each replication. 
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Fig. 5 Parameter estimates and 
their standard errors when the f, 
exponential power, and skew 
normal distributions are, 
respectively, misspecified as 
normal distribution from top to 
bottom 


Estimates 


Standard errors 



Normal 




Normal 


Normal 


that if RESE > 0, it indicates a loss of efficiency; other¬ 
wise, there is a gain in the efficiency. 4 The RMSE for each 
parameter is given in Fig. 5 for the condition N = 500 and 


4 Given that the standard error is generally inversely proportional to the 
square root of the sample size, RESE = 5 % indicates an increase 
of about 10 % sample size and RESE = 10% indicates an increase 
of about 20 % sample size to achieve the same efficiency. Therefore, 
RESE = 5 % should not be ignored and RESE = 10 % can be 
considered as substantial. 


T — 5 and all of them are greater than 0, indicating the loss 
of efficiency when the error distribution is misspecified. 

Table 4 summarizes the RESE when the non-normal 
error distributions are misspecified as the normal distribu¬ 
tion with different sample sizes and measurement occasions. 
Although the RESE varies, the overall patterns are the 
same. First, under all studies condition, there shows a 
reduced efficiency. Second, the random-effects parameters 
ctq and fT ( jj are influenced the most by the misspecification 
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Table 4 RESE when the non-normal error distributions are misspecified as normal (in percentage) 

Sample Size ( N ) 

t 



Exponential Power 


Skew Normal 


100 

300 

500 

100 

300 

500 

100 

300 

500 

Po 

7.25 

8.52 

8.18 

0.64 

1.97 

2.29 

2.43 

2.54 

2.78 

Pi 

2.04 

2.95 

2.89 

0.61 

0.59 

0.53 

0.94 

1.04 

1.11 

T=4 

14.47 

20.87 

19.54 

8.45 

6.59 

8.02 

11.29 

10.49 

11.97 

obi 

10.07 

14.68 

14.24 

4.12 

3.99 

4.96 

7.37 

7.49 

8.22 


4.39 

6.50 

6.23 

0.88 

1.26 

1.55 

2.55 

2.78 

3.02 

Po 

5.61 

7.77 

6.38 

0.89 

1.86 

1.90 

2.79 

3.39 

3.29 

Pi 

1.21 

1.92 

1.47 

0.11 

0.40 

0.33 

0.91 

1.03 

0.81 

T=5 

12.57 

19.34 

14.28 

1.04 

5.51 

5.39 

10.60 

11.01 

10.66 

a 0\ 

7.40 

9.66 

9.27 

0.78 

2.79 

3.05 

6.32 

7.11 

6.76 


2.59 

2.77 

3.21 

0.22 

0.87 

0.83 

2.01 

2.10 

1.97 


of the error distributions.Third, there is no clear pattern of 
the relationship between RESE and sample sizes as well as 
measurement occasions. 

Table 5 summarizes the RESE to compare the boot¬ 
strap method and the Bayesian method with the correctly 
specified error distributions. If the RESE is negative, the 
bootstrap method is more efficient. Otherwise, the Bayesian 
method is more efficient. The results show that for /So. erp, 
and ctj 2 , the Bayesian method is consistently more efficient. 
For /Si and eroi, the results are mixed. Overall, it seems that 
the Bayesian method is more efficient than the bootstrap 
method. 


Misspecifying normal error as non-normal error 

We have shown that misspecifying a non-normal error dis¬ 
tribution to the normal distribution can reduce the efficiency 
of standard error estimates. Now, we evaluate the situa¬ 
tion where the normal error distribution is misspecified as 
a non-normal distribution. The normal distribution can be 
viewed as a special case of the (-distribution, exponen¬ 
tial power distribution, and skew normal distribution. If the 
errors are normally distributed but the three non-normal dis¬ 
tributions are used in the model, the error distribution is 
over-specified, more precisely than misspecified. Therefore, 


Table 5 RESE comparing the bootstrap standard errors with the Bayesian standard deviations using correctly specified error distributions (in 
percentage) 


Sample Size ( N ) 


t 



Exponential Power 


Skew Normal 


100 

300 

500 

100 300 

500 

100 300 

500 



Po 

11.20 

10.56 

10.22 

5.33 

5.07 

3.62 

5.60 

4.62 

5.90 


Pi 

-0.14 

1.24 

1.90 

0.13 

1.98 

0.30 

-0.51 

0.97 

1.16 

T=4 

°o 

24.12 

44.16 

42.35 

3.79 

8.20 

8.72 

9.17 

12.29 

16.09 


Coi 

10.20 

22.01 

20.66 

-0.72 

4.22 

3.75 

-1.58 

4.11 

5.18 



27.01 

43.72 

42.50 

4.97 

9.79 

8.99 

9.21 

12.77 

15.28 


Po 

14.52 

14.62 

14.01 

9.89 

10.01 

9.96 

11.28 

11.04 

11.29 


Pi 

-0.59 

1.39 

1.47 

-1.50 

1.06 

0.35 

-0.98 

-0.63 

0.38 

T=5 

°0 

28.53 

43.93 

42.56 

12.28 

20.66 

21.08 

21.43 

25.37 

26.94 


Coi 

-2.85 

7.62 

7.19 

-1.73 

-1.97 

0.41 

-2.26 

-1.88 

0.52 



20.46 

33.74 

33.83 

10.39 

18.27 

18.41 

16.11 

19.79 

21.12 
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Fig. 6 Parameter estimates and 
their standard errors when the 
normal error distribution is 
misspecified as the t, 
exponential power, and skew 
normal distributions, 
respectively, from top to bottom 


Estimates 


Standard errors 



Normal 


Normal 


one would expect this kind of misspecification should not 
have a big influence on the parameter estimates and their 
standard errors. 

Figure 6 shows scatterplots of the parameter estimates 
and their standard errors when modeling the error distribu¬ 
tion as both the normal and non-normal distributions with 
N = 500 and T = 5 across all 100 replications of data 
analysis. Clearly, the parameter estimates and their standard 


errors were very similar when the error was modeled as 
both the normal and non-normal distributions. Table 6 sum¬ 
marizes the RESE for all investigated conditions. Across 
the table, RESE is mostly smaller than 0.5 %. Compared 
to RESE in Table 4, the loss in efficiency in the standard 
error estimates is negligible when the normal error distri¬ 
bution was misspecified as any of the three non-normal 
distributions discussed here. 
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Table 6 RESE when the normal error distribution is misspecified as non-normal (in percentage) 

Sample Size (N) 

t 



Exponential Power 


Skew Normal 


100 

300 

500 

100 

300 

500 

100 

300 

500 

Po 

0.20 

-0.12 

-0.08 

0.43 

0.12 

0.07 

0.58 

-0.04 

-0.09 

Pi 

-0.01 

0.03 

0.14 

0.22 

-0.01 

0.10 

0.29 

-0.06 

-0.07 

T=4 

0.17 

-0.15 

-0.05 

-0.11 

0.19 

0.53 

1.20 

-0.11 

0.14 

cot 

0.10 

-0.09 

-0.09 

0.35 

0.05 

0.30 

0.60 

0.16 

0.14 

a l 

-0.08 

0.02 

-0.11 

0.23 

0.12 

0.27 

0.33 

0.16 

0.03 

P 0 

0.09 

0.05 

0.05 

0.39 

0.16 

0.15 

0.20 

0.05 

-0.10 

Pi 

-0.07 

0.04 

-0.06 

0.01 

-0.01 

0.03 

0.07 

0.07 

-0.05 

T=5 

0.14 

0.00 

0.25 

0.38 

0.31 

0.32 

0.62 

0.01 

-0.03 

O"0l 

0.16 

-0.06 

-0.09 

0.16 

0.22 

0.17 

0.31 

-0.03 

0.05 


0.07 

0.13 

-0.14 

0.01 

0.12 

-0.01 

0.02 

-0.02 

0.00 


Summary 

When the error distributions were misspecified, there was 
no noticeable influence on the parameter estimates of the 
GCMs. If the normal error distribution was misspecified 
as one of the three non-normal distributions discussed in 
the study, its influence on the standard error estimates was 
also negligible. However, if the non-normal error distribu¬ 
tions were misspecified as normal distributions, the loss in 
the efficiency of the standard error estimates can be large. 
In addition, by correctly specifying the error distribution, 
one can obtain more efficient results than the bootstrap 
method. 

Discussion 

Growth curve models have been widely used in social and 
behavioral sciences. However, typical GCMs often assume 
that the errors are normally distributed. The assumption 
is further carried by both commercial and free software 
for growth curve analysis and makes it less flexible for 
non-normal data analysis although non-normal data may 
be even more common than normal data (Micceri, 1989). 
As already shown by the literature, blindly assuming nor¬ 
mality may lead to severe statistical inference problems 
(Yuan et al., 2004; Yuan & Zhang, 2012a). 

In this study, we proposed a general Bayesian framework 
to flexibly model normal and non-normal data through the 
explicit specification of the error distributions. Within the 
framework, the error distribution is first explored through 
exploratory data analysis techniques. Then, a statistical dis¬ 
tribution that best matches the error distribution is used 
in growth curve modeling. Bayesian methods are then 


used to estimate the model parameters. Although we have 
focused on the discussion of the f-distribution, exponen¬ 
tial power distribution, and skew normal distribution, almost 
any distribution can be applied to the error in growth curve 
analysis. 

Through a simulation study, we found that when the nor¬ 
mal error distribution was misspecified as non-normal error 
distributions discussed in this study, both parameter esti¬ 
mates and their standard errors were very comparable. How¬ 
ever, when the non-normal error distributions were misspec¬ 
ified as normal distribution, the loss in the efficiency of the 
standard error estimates can be large. Therefore, it seemed 
that moving from normal distribution to those that closely 
resemble the data is compelling. The similar conclusion is 
also found in Zhang et al. (2013). 

We demonstrated how to conduct growth curve anal¬ 
ysis with both normal and non-normal error distributions 
practically using the MCMC procedure. The normal dis¬ 
tribution and r-distribution are built-in distributions of the 
MCMC procedure. Therefore, the extension of the growth 
curve model with the normal error in the MLE frame¬ 
work to the Bayesian framework is almost effortless. It is 
also logically natural to extend the models with the nor¬ 
mal error to the ones with the t error. Although there 
are no built-in functions for exponential power distribution 
and skew normal distribution, the MCMC procedure allows 
the specification of new distributions. From our demon¬ 
stration, potentially any distribution with known density 
function can be used to model the error of a GCM. Overall, 
Bayesian growth curve modeling seems to offer great flex¬ 
ibility in longitudinal data analysis. Although GCMs with 
non-normal error distributions potentially can be estimated 
in the MLE framework, we are not aware of easy to use 
software to carry out such analysis. 
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For demonstration purposes, we have focused our dis¬ 
cussion on the linear GCMs. However, the methods 
proposed can be equally applied to the nonlinear GCMs 
(e.g., Grimm et ah, 2011). The nonlinear model esti¬ 
mation can also be conducted using the the MCMC 
procedure. 

Given the fact that standard error estimates, not param¬ 
eter estimates, are influenced by distribution misspecifica- 
tion, one might wonder why not use the MLE with robust 
standard error or bootstrap standard error (e.g., Yuan & 
Hayashi, 2006). Based on the likelihood theory, the MLE 
is most efficient when the distribution is correctly specified 
(e.g., Casella & Berger, 2001). Asymptotically, Bayesian 
method is equivalent to MLE especially with uninforma¬ 
tive priors (e.g., Gelman et ah, 2003). Therefore, one would 
still expect the Bayesian method will provide more effi¬ 
cient standard error estimates than the robust or bootstrap 
method based on the misspecified normal error distribution. 
Our simulation study actually supported this in finding that, 
overall, modeling the error distribution is more efficient 
than the bootstrap method. 

The current study can be extended in different ways. 
Firstly, in determining the error distribution, the individ¬ 
ual growth curve analysis is used. However, the method 
assumes that the growth function is known a priori. In 
practice, this might not be possible. Although it is easy to 
distinguish a linear growth from a non-linear growth, it can 
be difficult to choose among several potential non-linear 
forms. Therefore, more rigorous method can and should be 
developed to determine the best growth function. Secondly, 
in exploring the error distribution, it is assumed that the 
error would follow a certain known distribution. It is likely 
that one cannot find a good distribution to approximate the 
error distribution from the individual growth analysis. One 
potential solution is to use a non-parametric error distri¬ 
bution as discussed by Tong and Zhang (2014). Thirdly, 
although it is shown that when a normal distribution is mis¬ 
specified as a non-normal distribution, the loss of efficiency 
is minimum, it is not clearly what will happen if the error 
distribution is completely misspecified. For example, what 
will happen if a skewed normal distribution is misspecified 
as an exponential power distribution with negative kurtosis. 
Fourthly, the current study did not discuss how to evalu¬ 
ate a model fit and how to conduct model selection with 
the new error distributions. Evaluation of a model fit can 
inform whether a model fits the data adequately and model 
selection can help decide among more than one compet¬ 
ing models. Especially, model selection can also help the 
choice of the growth function forms and the error distri¬ 
butions. Many criteria for model evaluation are available 
(e.g., Lu et ah, 2015; Spiegelhalter et al., 2002) and can 
be potentially applied for the current study. Fifthly, the cur¬ 
rent study assumes that the error can be non-normal but 


the random coefficients are still normal. There exist multi¬ 
variate counterparts for the t , exponential power, and skew 
normal distributions. However, we are only aware of the 
extension of the multivariate f-distribution to the model the 
random coefficients (Tong & Zhang, 2012). Although the 
above concerns are out of the scope of the current study, 
that is to introduce a new way to model error distribu¬ 
tions in growth curve analysis, they will be investigated in 
future studies. 
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